On Partially Observable Markov Decision Processes Using Genetic Algorithm Based Q-Learning
Authors
Abstract
As powerful probabilistic models for optimal policy search, partially observable Markov decision processes (POMDPs) still suffer from problems such as hidden state and uncertainty in action effects. In this paper, a novel approximate algorithm, Genetic-Algorithm-based Q-Learning (GAQ-Learning), is proposed to solve POMDP problems. In the proposed methodology, a genetic algorithm maintains a population of policies, while traditional Q-learning supplies predicted rewards that serve as the fitness of the evolved policies. GAQ-Learning addresses the hidden-state problem with a novel hybrid method in which historical information is combined with the current belief state to find the optimal policy. Experiments conducted on benchmark datasets show that the proposed methodology is superior to other state-of-the-art approximate POMDP methods.
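The abstract describes the core loop: a genetic algorithm evolves a population of policies, and a learned value estimate supplies each policy's fitness. The sketch below illustrates that loop on a hypothetical toy problem with aliased observations (hidden state). All names, the toy environment, and the fitness function are assumptions for illustration; in particular, fitness here is estimated by Monte Carlo rollouts rather than the paper's Q-learning value predictions, which cannot be reproduced from the abstract alone.

```python
import random

random.seed(0)

# Toy problem (assumed, not from the paper): 4 hidden states but only
# 2 observations, so the observation alone is ambiguous (hidden state).
# A policy maps (observation, last action) -> action, a crude stand-in
# for the paper's history-plus-belief representation.
N_OBS, N_ACT = 2, 2

def random_policy():
    return {(o, a): random.randrange(N_ACT)
            for o in range(N_OBS) for a in range(N_ACT)}

def evaluate(policy, episodes=20, horizon=15):
    """Fitness = average episode return under Monte Carlo rollouts."""
    total = 0.0
    for _ in range(episodes):
        state, last_a = random.randrange(4), 0
        for _ in range(horizon):
            obs = state % N_OBS              # aliased observation
            a = policy[(obs, last_a)]
            if state == 3 and a == 1:        # reward only in one hidden state
                total += 1.0
            state = (state + 1 + a) % 4      # action influences the dynamics
            last_a = a
    return total / episodes

def crossover(p1, p2):
    # Uniform crossover over the policy table.
    return {k: (p1[k] if random.random() < 0.5 else p2[k]) for k in p1}

def mutate(p, rate=0.1):
    return {k: (random.randrange(N_ACT) if random.random() < rate else v)
            for k, v in p.items()}

def ga_policy_search(pop_size=20, generations=30):
    """Evolve a population of policies; fitness guides selection."""
    pop = [random_policy() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=evaluate, reverse=True)
        elite = ranked[:pop_size // 2]       # keep the fitter half
        pop = elite + [
            mutate(crossover(random.choice(elite), random.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
    return max(pop, key=evaluate)

best = ga_policy_search()
print("best policy fitness:", evaluate(best, episodes=50))
```

Swapping the rollout-based `evaluate` for a Q-function's predicted return would recover the division of labor the abstract describes: evolution explores policy space while value learning scores candidates cheaply.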
Related Resources
Safe Q-Learning on Complete History Spaces
In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze certain conditions under which a history-based approach is able to learn policies compar...
Training a real-world POMDP-based Dialogue System
Partially Observable Markov Decision Processes provide a principled way to model uncertainty in dialogues. However, traditional algorithms for optimising policies are intractable except for cases with very few states. This paper discusses a new approach to policy optimisation based on grid-based Q-learning with a summary of belief space. We also present a technique for bootstrapping the system ...
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
Satinder Singh, AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, [email protected]. Partially Observable Markov Decision Processes (POMDPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-learning may no long...
Solving POMDP by On-Policy Linear Approximate Learning Algorithm
This paper presents a fast Reinforcement Learning (RL) algorithm to solve Partially Observable Markov Decision Process (POMDP) problems. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), which is in essence an engineering application without an exact model. The algorithm consists of two phases. Firstly, the model is estimated and policy...
Active gesture recognition using partially observable Markov decision processes
We present a foveated gesture recognition system that guides an active camera to foveate salient features based on a reinforcement learning paradigm. Using vision routines previously implemented for an interactive environment, we determine the spatial location of salient body parts of a user and guide an active camera to obtain images of gestures or expressions. A hidden-state reinforcement lear...
Journal:
Volume, Issue
Pages: -
Publication date: 2007